Automatic target selection for structural genomics on eukaryotes.

Authors

  • Jinfeng Liu
  • Hedi Hegyi
  • Thomas B Acton
  • Gaetano T Montelione
  • Burkhard Rost
Abstract

A central goal of structural genomics is to experimentally determine representative structures for all protein families. At least 14 structural genomics pilot projects are currently investigating the feasibility of high-throughput structure determination; the National Institutes of Health funded nine of these in the United States. Initiatives differ in the particular subset of "all families" on which they focus. At the NorthEast Structural Genomics consortium (NESG), we target eukaryotic protein domain families. The automatic target selection procedure has three aims: 1) identify all protein domain families in the five eukaryotic target organisms that are currently fully sequenced, based on their sequence homology, 2) discard those families that can be modeled on the basis of structural information already present in the PDB, and 3) target representatives of the remaining families for structure determination. To guarantee that all members of one family share a common fold-like region, we had to begin by dissecting proteins into structural domain-like regions before clustering. Our hierarchical approach, CHOP, uses homology to PrISM, Pfam-A, and SWISS-PROT; it chopped the 103,796 eukaryotic proteins/ORFs into 247,222 fragments. Of these fragments, 122,999 appeared to be suitable targets and were grouped into >27,000 singletons and >18,000 multifragment clusters. Thus, our results suggested that it might be necessary to determine >40,000 structures to minimally cover the subset of five eukaryotic proteomes.
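
The three aims listed in the abstract can be illustrated with a toy sketch. This is not the authors' CHOP pipeline: the function name `select_targets`, the input shapes, the single-linkage grouping, the rule that a family is discarded when any member has a PDB template, and the choice of the longest fragment as representative are all assumptions made here for illustration only.

```python
from collections import defaultdict


def select_targets(fragments, similar_pairs, modelable):
    """Toy target selection (hypothetical helper, not the published method).

    fragments:     dict mapping fragment id -> length in residues
    similar_pairs: iterable of (id, id) pairs judged homologous
    modelable:     set of fragment ids that already have a usable PDB template
    """
    # Aim 1: group fragments into families (assumption: single linkage,
    # implemented with a small union-find structure).
    parent = {f: f for f in fragments}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in similar_pairs:
        parent[find(a)] = find(b)

    families = defaultdict(list)
    for f in fragments:
        families[find(f)].append(f)

    # Aim 2: discard families that can already be modeled
    # (assumption: one modelable member is enough to cover the family).
    remaining = [m for m in families.values()
                 if not any(f in modelable for f in m)]

    # Aim 3: pick one representative per remaining family
    # (assumption: the longest fragment, ties broken by id).
    reps = sorted(max(m, key=lambda f: (fragments[f], f)) for m in remaining)
    singletons = sum(1 for m in remaining if len(m) == 1)
    multi = len(remaining) - singletons
    return reps, singletons, multi


if __name__ == "__main__":
    frags = {"a": 120, "b": 90, "c": 200, "d": 75, "e": 150, "f": 60}
    pairs = [("a", "b"), ("b", "c"), ("d", "e")]
    print(select_targets(frags, pairs, modelable={"d"}))
```

In this sketch the number of structures to determine is the number of remaining families, that is, singletons plus multifragment clusters, which mirrors how the abstract arrives at its estimate from the two cluster counts.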

Related articles

Target space for structural genomics revisited

MOTIVATION: Structural genomics eventually aims at determining structures for all proteins. However, in the beginning experimentalists are likely to focus on globular proteins to achieve a rapid basic coverage of protein sequence space. How many proteins will structural genomics have to target? How many proteins will be excluded since we already have structural information for these or since the...

Implications of structural genomics target selection strategies: Pfam5000, whole genome, and random approaches.

Structural genomics is an international effort to determine the three-dimensional shapes of all important biological macromolecules, with a primary focus on proteins. Target proteins should be selected according to a strategy that is medically and biologically relevant, of good value, and tractable. As an option to consider, we present the "Pfam5000" strategy, which involves selecting the 5000 ...

Target selection for structural genomics based on ProtoNet classification

Motivation: Structural genomics projects aim to solve a large number of protein structures to eventually represent the entire protein space. To this end it is necessary to increase the rate at which new families, superfamilies and folds are discovered. To facilitate that, strategies to improve the selection of targets for structural determination are needed. An important component in the design...

Detecting subtle functional differences in ketopantoate reductase and related enzymes using a rule-based approach with sequence-structure homology recognition scores.

Ketopantoate reductase (KPR) is the second enzyme in the pantothenate (vitamin B(5)) biosynthesis pathway, an essential metabolic pathway identified as a potential target for new antimicrobials. The sequence similarity among putative KPRs is limited and KPR itself belongs to a large superfamily of 6-phosphogluconate dehydrogenases. Therefore, it is necessary to discriminate between true and othe...

Journal:
  • Proteins

Volume 56, Issue 2

Pages: -

Publication date: 2004